Method for analyzing music using sounds of instruments

ABSTRACT

A method for analyzing digital-sounds using sound-information of instruments and/or score-information is provided. In particular, sound-information of the instruments which were used, or which are being used, to generate the input digital-sounds is used. Alternatively, in addition to the sound-information, score-information which was used, or which is being used, to generate the input digital-sounds is also used. According to the method, sound-information including the pitches and strengths of notes performed on the instruments used to generate the input digital-sounds is stored in advance, so that monophonic or polyphonic pitches performed on the instruments can be easily analyzed. Since the sound-information of the instruments and the score-information are used together, the input digital-sounds can be accurately analyzed and output as quantitative data.

This application is the national phase under 35 U.S.C. § 371 of PCT International Application No. PCT/KR01/02081, which has an International Filing Date of Dec. 3, 2001, and which designated the United States of America.

TECHNICAL FIELD

The present invention relates to a method for analyzing digital-sound-signals, and more particularly to a method for analyzing digital-sound-signals by comparing the frequency-components of input digital-sound-signals with the frequency-components of the performing-instruments' sounds.

BACKGROUND ART

Since personal computers began to spread in the 1980s, the technology, performance, and environment of computers have developed rapidly. In the 1990s, the Internet was rapidly applied to various fields of business and personal life. Computers are therefore becoming very important in every field throughout the world in the 21st century. One of the computer-music applications is the musical instrument digital interface (MIDI). MIDI is a representative computer-music technique used by musicians to synthesize and/or store the musical sounds of instruments or voices. At present, MIDI is a technique mainly used by popular-music composers and players.

For example, composers can easily compose music using computers connected to electronic MIDI instruments, and computers or synthesizers can easily reproduce the composed MIDI music. In addition, sounds produced using MIDI equipment can be mixed with vocals in studios and recreated as popular songs that win the support of the public.

The MIDI technique has been developed in combination with popular music and has entered the field of musical education. MIDI uses only simple musical-information, such as instrument-types, notes, note-strengths, and the onset and offset of notes, regardless of the actual sounds of a musical performance, so that MIDI data can be easily exchanged between MIDI instruments and computers. Accordingly, the MIDI data generated by electronic-MIDI-pianos can be utilized in musical education using computers connected to those electronic-MIDI-pianos. Therefore, many companies, including Yamaha in Japan, develop musical-education software using MIDI.

However, the MIDI technique does not satisfy the desires of most classical musicians, who treasure the sounds of acoustic instruments and the feelings that arise when playing acoustic instruments. Because most classical musicians do not like the sounds and feel of electronic instruments, they study music through traditional methods and learn how to play acoustic instruments. Accordingly, music teachers and students teach and learn classical music in academies or schools of music, and students have no choice but to depend fully on their music teachers. In this situation, it is desirable to apply computer technology and digital-signal-processing technology to the field of classical-music education, so that music performed on acoustic instruments can be analyzed and the result of the analysis can be expressed as quantitative performance-information.

To this end, digital-sound analysis technology, in which the digital sounds converted from performances on acoustic instruments are analyzed, has been developed using computers from various viewpoints.

For example, a method of using score-information to extract MIDI data from recorded digital sounds is disclosed in a master's thesis entitled “Extracting Expressive Performance Information from Recorded Music,” written by Eric D. Scheirer. This thesis relates to extracting the strength, onset timing, and offset timing of each note and converting the extracted information into MIDI data. However, according to the results of the experiments described in the thesis, onset timings were extracted from recorded digital sounds fairly accurately, but the extraction of offset timings and note strengths was inaccurate.

Meanwhile, several small companies around the world have put on the market initial products that can analyze simple digital sounds using a music-recognition technique. According to the official alt.music.midi newsgroup FAQ (frequently asked questions), which is on the Internet page http://home.sc.rr.com/cosmogony/ammfaq.html, there are some products that convert wave files into MIDI data or score data by analyzing the digital sounds in the wave files. The products include Akoff Music Composer, Sound2MIDI, Gama, WIDI, Digital Ear, WAV2MID, Polyaxe Driver, WAV2MIDI, IntelliScore, PFS-System, Hanauta Musician, Audio to MIDI, AmazingMIDI, Capella-Audio, AutoScore, and the most recently published WaveGoodbye.

Some of these products are advertised as being able to analyze polyphonic-sounds. However, experiments showed that they could not actually analyze polyphonic-sounds. For this reason, the FAQ document describes that the reproduced MIDI sounds do not sound like the original sounds after the sounds have been converted into MIDI format. Moreover, the FAQ document plainly states that all of the software published so far for converting wave files into MIDI files is of no worth.

The following description concerns the results of an experiment on AmazingMIDI by Araki Software, conducted to find out how it analyzes polyphonic-sounds in a wave file.

FIG. 1 is a piece of musical score used in the experiment and shows the first two measures of the second movement of Beethoven's Piano Sonata No. 8. In FIG. 2, the score is divided into units of monophonic notes for convenience of analysis, and note names are assigned to the individual notes. FIG. 3 shows a parameter-setting window on which a user sets the parameters for converting a wave file into a MIDI file in AmazingMIDI. FIG. 4 is a window showing the converted MIDI data obtained when all of the parameter control bars are fixed at the right-most ends of their control sections. FIG. 5 shows the expected original notes, based on the score of FIG. 2, as black bars on the MIDI window of FIG. 4. FIG. 6 is another MIDI window showing the converted MIDI data obtained when all of the parameter control bars are fixed at the left-most ends of their control sections. FIG. 7 shows the expected original notes as black bars on the MIDI window of FIG. 6, like FIG. 5.

Referring to FIGS. 1 and 2, three notes C4, A3♭, and A2♭ start initially. Then, in a state where the piano keys corresponding to the notes C4 and A2♭ are pressed, the keys corresponding to the notes E3♭, A3♭, and E3♭ are sequentially pressed. Next, a note B3♭ follows the note C4, and simultaneously, notes D3♭ and G3 follow the notes A2♭ and E3♭, respectively. Then, in a state where the keys corresponding to the notes B3♭ and D3♭ are pressed, the keys corresponding to the notes E3♭, G3, and E3♭ are sequentially pressed. Accordingly, when a wave file based on this score is converted to MIDI data, the MIDI data should be configured as expressed by the black bars shown in FIG. 5. However, in the real experiment, the MIDI data was configured as shown in FIG. 4.

Referring to FIG. 3, AmazingMIDI allows a user to set various parameters for converting wave files into MIDI files. The configuration of the MIDI data varied greatly with the values of these parameters. When the values of Minimum Analysis, Minimum Relative, and Minimum Note were set to the right-most values on the parameter input window of FIG. 3, the MIDI data resulting from the conversion was as shown in FIG. 4. When these values were set to the left-most values, the MIDI data resulting from the conversion was as shown in FIG. 6. When FIG. 4 is compared with FIG. 6, it can be seen that there is a great difference between them. In other words, only frequencies having large magnitudes in the frequency domain were recognized and expressed in the form of MIDI in FIG. 4, whereas frequencies having small magnitudes were also recognized and expressed in the form of MIDI in FIG. 6. Accordingly, the MIDI data shown in FIG. 6 essentially contains the MIDI data of FIG. 4.

When compared with FIG. 5, FIG. 4 shows that the notes A2♭, E3♭, G3, and D3♭ were not recognized at all, and that the recognition of the notes C4, A3♭, and B3♭ was very different from the actual performance based on the score of FIG. 2. In detail, in the case of the note C4, the recognized length is only the initial 25% of the original length. In the case of the note B3♭, the recognized length is less than 20% of the original length. In the case of the note A3♭, the recognized length is only 35% of the original length. Moreover, many notes that were not performed were recognized. A note E4♭ was recognized with a loud strength, and unperformed notes A4♭, G4, B4♭, D5, and F5 were wrongly recognized.

When compared with FIG. 7, FIG. 6 shows that although the notes A2♭, E3♭, G3, D3♭, C4, A3♭, and B3♭ that were actually performed were all recognized, the recognized notes were very different from the performed notes. In other words, the actual sounds of the notes C4 and A2♭ continued because the keys were kept pressed, but the notes C4 and A2♭ were recognized as being stopped at least once. In the case of the notes A3♭ and E3♭, the recognized onset timings and note lengths were very different from the actually performed ones. In FIGS. 6 and 7, many gray bars appear in addition to the black bars. The gray bars indicate notes that were wrongly recognized although they were not actually performed. These wrongly recognized gray bars outnumber the correctly recognized bars. Although the results of the experiments on programs other than the AmazingMIDI program will not be described in this specification, the results of the experiments on all published music-recognition programs proved to be similar to the result of the experiment on the AmazingMIDI program and were not satisfactory.

Although techniques for analyzing music performed on acoustic instruments using computer technology and digital-signal-processing technology have been developed from various viewpoints, satisfactory results have never been obtained.

DISCLOSURE OF THE INVENTION

Accordingly, the present invention aims at providing a method for analyzing music using sound-information previously stored with respect to the instruments used in a performance, so that a more accurate result of analyzing the performance can be obtained and the result can be extracted in the form of quantitative data.

In other words, it is a first object of the present invention to provide a method for analyzing music by comparing the components contained in digital-sounds with the components contained in the sound-information of musical instruments and analyzing the components, so that polyphonic pitches as well as monophonic pitches can be accurately analyzed.

It is a second object of the present invention to provide a method for analyzing music using sound-information of musical instruments and score-information of the music, so that an accurate result of analysis can be obtained and the time for analyzing the music can be reduced.

To achieve the first object of the present invention, there is provided a method for analyzing music using sound-information of musical instruments. The method includes the steps of (a) generating and storing sound-information of different musical instruments; (b) selecting the sound-information of a particular instrument to be actually played from among the stored sound-information of different musical instruments; (c) receiving digital-sound-signals; (d) decomposing the digital-sound-signals into frequency-components in units of frames; (e) comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information, and analyzing the frequency-components of the digital-sound-signals to detect monophonic-pitches-information from the digital-sound-signals; and (f) outputting the detected monophonic-pitches-information.

To achieve the second object of the present invention, there is provided a method for analyzing music using sound-information of musical instruments and score-information. The method includes the steps of (a) generating and storing sound-information of different musical instruments; (b) generating and storing score-information of a score to be performed; (c) selecting the sound-information of a particular instrument to be actually played and the score-information of a score to be actually performed from among the stored sound-information of different musical instruments and the stored score-information; (d) receiving digital-sound-signals; (e) decomposing the digital-sound-signals into frequency-components in units of frames; (f) comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information and the selected score-information, and analyzing the frequency-components of the digital-sound-signals to detect performance-error-information and monophonic-pitches-information from the digital-sound-signals; and (g) outputting the detected monophonic-pitches-information and/or the detected performance-error-information.
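
For illustration only, a minimal sketch of how steps (c) through (f) of the first method could be arranged is shown below, using Python and numpy as implementation choices. The frame size and the match_notes() helper are assumptions introduced here, not details fixed by the method; the matching of step (e) is elaborated with Pseudo-code 1 later. The second method would additionally consult the selected score-information inside the loop.

    import numpy as np

    def analyze(signal, rate, sound_info, frame_size=8192):
        # Steps (c)-(f): receive the signal, decompose each frame into
        # frequency-components by FFT, match them against the selected
        # sound-information, and output what was detected per frame.
        results = []
        for start in range(0, len(signal) - frame_size + 1, frame_size):
            frame = signal[start:start + frame_size]                 # step (d)
            spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_size)))
            notes = match_notes(spectrum, sound_info)                # step (e)
            results.append((start / rate, notes))                    # step (f)
        return results

    def match_notes(spectrum, sound_info):
        # Illustrative stand-in for step (e): report a note when the bin of
        # its stored peak frequency rises above a stored threshold.
        return [name for name, ref in sound_info.items()
                if spectrum[ref["peak_bin"]] >= ref["threshold"]]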

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a score corresponding to the first two measures of the second movement of Beethoven's Piano Sonata No. 8.

FIG. 2 is a diagram of a score in which the polyphonic-notes in the score shown in FIG. 1 are divided into monophonic-notes.

FIG. 3 is a diagram of a parameter-setting-window of the AmazingMIDI program.

FIG. 4 is a diagram of one result of converting actually performed notes of the score shown in FIG. 1 into MIDI data using the AmazingMIDI program.

FIG. 5 is a diagram in which the actually performed notes are expressed as black bars on FIG. 4.

FIG. 6 is a diagram of another result of converting actually performed notes of the score shown in FIG. 1 into MIDI data using the AmazingMIDI program.

FIG. 7 is a diagram in which the actually performed notes are expressed as black bars on FIG. 6.

FIG. 8 is a conceptual diagram of a method for analyzing digital-sounds.

FIGS. 9A through 9E are diagrams of examples of piano sound-information used to analyze digital-sounds.

FIG. 10 is a flowchart of a process for analyzing input digital-sounds based on sound-information of different kinds of instruments according to a first embodiment of the present invention.

FIG. 10A is a flowchart of a step of detecting monophonic-pitches-information from the input digital-sounds in units of sound frames based on the sound-information of different kinds of instruments according to the first embodiment of the present invention.

FIG. 10B is a flowchart of a step of comparing the frequency-components of the input digital-sounds with the frequency-components of the sound-information of a performed instrument in frame units and analyzing the frequency-components of the digital-sounds based on the sound-information of different kinds of instruments according to the first embodiment of the present invention.

FIG. 11 is a flowchart of a process for analyzing input digital-sounds based on sound-information of different kinds of instruments and score-information according to a second embodiment of the present invention.

FIG. 11A is a flowchart of a step of detecting monophonic-pitches-information and performance-error-information from the input digital-sounds in units of frames based on the sound-information of different kinds of instruments and the score-information according to the second embodiment of the present invention.

FIGS. 11B and 11C are flowcharts of a step of comparing the frequency-components of the input digital-sounds with the frequency-components of the sound-information of a performed instrument in frame units and analyzing the frequency-components of the digital-sounds based on the sound-information and the score-information according to the second embodiment of the present invention.

FIG. 11D is a flowchart of a step of adjusting the expected-performance-value based on the sound-information of different kinds of instruments and the score-information according to the second embodiment of the present invention.

FIG. 12 is a diagram of the result of analyzing the frequency-components of the sound of a piano played according to the first measure of the score shown in FIGS. 1 and 2.

FIGS. 13A through 13G are diagrams of the results of analyzing the frequency-components of the sounds of the individual notes performed on a piano, which are contained in the first measure of the score.

FIGS. 14A through 14G are diagrams of the results of indicating the frequency-components of each of the notes contained in the first measure of the score on FIG. 12.

FIG. 15 is a diagram in which the frequency-components shown in FIG. 12 are compared with the frequency-components of the notes contained in the score of FIG. 2.

FIGS. 16A through 16D are diagrams of the results of analyzing the frequency-components of the notes, which are performed according to the first measure of the score shown in FIGS. 1 and 2, by performing the fast Fourier transform (FFT) using FFT windows of different sizes.

FIGS. 17A and 17B are diagrams showing time-errors occurring during the analysis of digital-sounds, which errors vary with the size of an FFT window.

FIG. 18 is a diagram of the result of analyzing the frequency-components of the sound obtained by synthesizing a plurality of pieces of monophonic-pitches-information detected using sound-information and/or score-information according to the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, a method for analyzing music according to the present invention will be described in detail with reference to the attached drawings.

FIG. 8 is a conceptual diagram of a method for analyzing digital-sounds. Referring to FIG. 8, the input digital-sound-signals are analyzed (80) using musical-instrument sound-information 84 and input music score-information 82; as a result, performance-information, accuracy, MIDI data, and so on are detected, and an electronic-score is displayed.

Here, digital-sounds include anything in formats such as PCM waves, CD audio, or MP3 files, in which input sounds are digitized and stored so that computers can process the sounds. Music that is performed in real time can be input through a microphone connected to a computer and analyzed while being digitized and stored.

The input score-information 82 includes note-information, note-length-information, speed-information (e.g., =64, and fermata ( )), tempo-information (e.g., 4/4), note-strength-information (e.g., forte, piano, accent (>), and crescendo ( )), detailed performance-information (e.g., staccato, staccatissimo, and pralltriller), and information for discriminating the staves for the left hand from the staves for the right hand in the case where both hands are used to perform music on, for example, a piano. In addition, in the case where at least two instruments are used, information about the staves for each instrument is included. In other words, all of the information on a score which people apply to perform music on musical-instruments can be used as score-information. Since notation differs among composers and ages, detailed notation will not be described in this specification.

The musical-instrument sound-information 84 is previously constructed for each of the instruments used for performance, as shown in FIGS. 9A through 9E, and includes information such as pitch, note strength, and pedal table. This will be further described later with reference to FIGS. 9A through 9E.

As shown in FIG. 8, in the present invention, sound-information, or both sound-information and score-information, is utilized to analyze input digital-sounds. The present invention can accurately analyze the pitch and strength of each note even if many notes are performed simultaneously, as in piano music, and can detect performance-information, including which notes are performed at what strength, from the analyzed information in each time slot.

To analyze input digital-sounds, the sound-information of musical-instruments is used because each musical-note has an inherent pitch-frequency and inherent harmonic-frequencies, and these pitch-frequencies and harmonic-frequencies are fundamental to analyzing the performed sounds of acoustic-instruments and human-voices.

Different types of instruments usually have different peak-frequency-components (pitch-frequencies and harmonic-frequencies). Accordingly, it is possible to analyze digital-sounds by comparing the peak-frequency-components of the digital-sounds with the peak-frequency-components of different types of instruments, which are detected in advance and stored as sound-information by instrument type.

For example, if sound-information for the 88 keys of a piano is detected and stored in advance, then even if different notes are performed simultaneously on the piano, the sounds of the simultaneously performed notes can be compared with combinations of the 88 sounds previously stored as sound-information. Therefore, each of the simultaneously performed notes can be accurately analyzed.

FIGS. 9A through 9E are diagrams of examples of piano sound-information used to analyze digital-sounds. FIGS. 9A through 9E show examples of sound-information for the 88 keys of a piano made by Young-chang.

FIGS. 9A through 9C show the conditions used for detecting the sound-information of the piano. FIG. 9A shows the pitches A0 through C8 of the respective 88 keys. FIG. 9B shows note-strength identification information. FIG. 9C shows identification information indicating which pedals are used. Referring to FIG. 9B, the note strengths can be classified into predetermined levels from “−∞” to “0”. Referring to FIG. 9C, the case where a pedal is used is expressed by “1”, and the case where a pedal is not used is expressed by “0”. FIG. 9C shows all cases of use of the three pedals of the piano.
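
For reference, the pitch-frequencies of the 88 keys in FIG. 9A follow the standard equal-temperament relation; the short Python sketch below computes them. This relation is common music theory rather than something taken from the patent itself.

    # Equal-temperament pitch frequencies of the 88 piano keys (A0 through C8).
    def key_frequency(n, a4=440.0):
        # n = 1 for A0, 49 for A4, 88 for C8
        return a4 * 2.0 ** ((n - 49) / 12.0)

    print(round(key_frequency(1), 1))    # A0: 27.5 Hz
    print(round(key_frequency(40), 1))   # C4: 261.6 Hz (the "262 Hz" cited below)
    print(round(key_frequency(88), 1))   # C8: 4186.0 Hz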

FIGS. 9D and 9E show examples of the actual formats in which the sound-information of the piano is stored. FIGS. 9D and 9E show the sound-information for the case where the note is C4, the note strength is −7 dB, and no pedals are used, under the conditions of sound-information shown in FIGS. 9A through 9C. Specifically, FIG. 9D shows the sound-information stored in wave format, and FIG. 9E shows the sound-information stored in frequency format, as a spectrogram. Here, a spectrogram shows the magnitudes of the individual frequencies in the time domain. The horizontal axis of the spectrogram indicates time information, and the vertical axis indicates frequency information. Referring to a spectrogram such as the one shown in FIG. 9E, the magnitudes of the frequency-components can be obtained at each time.

In other words, when the sound-information of each musical-instrument is stored in the form of samples of sounds having at least one strength, the sounds of each note can be stored as sound-information in wave form, as shown in FIG. 9D, so that frequency-components can be detected from the waves during the analysis of digital-sounds, or the magnitudes of the individual frequency-components can be stored directly as the sound-information, as shown in FIG. 9E.

In order to directly express the sound-information of each musical-instrument as the magnitudes of individual frequency-components, frequency-analysis methods such as the Fourier transform or the wavelet transform can be used.
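
A minimal sketch of this construction is given below, assuming Python with numpy; the frame and hop sizes are illustrative choices, not values prescribed by the method.

    import numpy as np

    def note_spectrogram(samples, rate, fft_size=8192, hop=2048):
        # Magnitude spectra of one recorded note, one row per frame; this is
        # the frequency-format sound-information of FIG. 9E for one condition
        # (pitch, strength, pedals) of FIGS. 9A through 9C.
        window = np.hanning(fft_size)
        frames = [samples[i:i + fft_size] * window
                  for i in range(0, len(samples) - fft_size, hop)]
        return np.abs(np.fft.rfft(frames, axis=1))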

If a string-instrument, for example a violin, is used as the musical-instrument, the sound-information can be classified by the different strings used for the same notes and stored accordingly.

Such sound-information of each musical-instrument can be periodically updated according to a user's selection, considering the fact that the sound-information of a musical-instrument can vary with the lapse of time or with circumstances such as temperature.

FIGS. 10 through 10B are flowcharts of a method of analyzing digital-sounds according to a first embodiment of the present invention. The first embodiment of the present invention will be described in detail with reference to the attached drawings.

FIG. 10 is a flowchart of a process for analyzing input digital-sounds based on sound-information of different kinds of instruments according to the first embodiment of the present invention. The process for analyzing input digital-sounds based on sound-information of different kinds of instruments according to the first embodiment of the present invention will be described with reference to FIG. 10.

After sound-information of different kinds of instruments is generated and stored (not shown), the sound-information of the instrument for the actual performance is selected in step s100. Here, the sound-information of different kinds of instruments is stored in formats as shown in FIGS. 9A through 9E.

Next, if digital-sound-signals are input in step s200, the digital-sound-signals are decomposed into frequency-components in units of frames in step s400. The frequency-components of the digital-sound-signals are compared with the frequency-components of the selected sound-information and analyzed to detect monophonic-pitches-information from the digital-sound-signals in units of frames in step s500. The detected monophonic-pitches-information is output in step s600.

The steps s200 and s400 through s600 are repeated until the input digital-sound-signals stop or an end command is input, in step s300.

FIG. 10A is a flowchart of the step s500 of detecting monophonic-pitches-information from the input digital-sounds in units of sound frames based on the sound-information of different kinds of instruments according to the first embodiment of the present invention. FIG. 10A shows the procedure for detecting monophonic-pitches-information with respect to a single current-frame. Referring to FIG. 10A, the time-information of the current-frame is detected in step s510. The frequency-components of the current-frame are compared with the frequency-components of the selected sound-information and analyzed to detect the current pitch and strength information of each of the monophonic-notes in the current-frame in step s520. In step s530, monophonic-pitches-information is detected from the current pitch-information, note-strength-information, and time-information.

If it is determined in step s540 that a current pitch in the detected monophonic-pitches-information is a new-pitch that was not included in the previous frame, the current-frame is divided into a plurality of subframes in step s550. A subframe including the new-pitch is detected from among the plurality of subframes in step s560. The time-information of the detected subframe is detected in step s570. The time-information of the new-pitch is updated with the time-information of the subframe in step s580. The steps s540 through s580 can be omitted when the new-pitch is in a low frequency range, or when accurate time-information is not required.
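
The subframe refinement of steps s550 through s580 might be sketched as follows in Python with numpy; the subframe size and the simple presence test are illustrative assumptions, standing in for the comparison actually made against the stored sound-information.

    import numpy as np

    def refine_onset(frame, rate, peak_hz, frame_start, sub_size=1024):
        # Re-scan the frame with smaller FFT windows (steps s550-s560) and
        # return the time of the first subframe containing the new pitch
        # (steps s570-s580); fall back to the coarse frame time otherwise.
        for i in range(0, len(frame) - sub_size + 1, sub_size // 2):
            sub = frame[i:i + sub_size] * np.hanning(sub_size)
            spectrum = np.abs(np.fft.rfft(sub))
            bin_idx = int(round(peak_hz * sub_size / rate))
            if spectrum[bin_idx] > 5.0 * spectrum.mean():  # crude presence test
                return frame_start + i / rate
        return frame_start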

FIG. 10B is a flowchart of the step s520 of comparing the frequency-components of the input digital-sounds with the frequency-components of the sound-information of the performed instrument in frame units and analyzing the frequency-components of the digital-sounds based on the sound-information of different kinds of instruments according to the first embodiment of the present invention.

Referring to FIG. 10B, the lowest peak frequency-components contained in the current-frame are selected in step s521. Next, the sound-information (S_CANDIDATES) containing the selected peak frequency-components is detected from the sound-information of the performed instrument in step s522. In step s523, the sound-information (S_DETECTED) having the peak-frequency-components most similar to the selected peak-frequency-components is detected as monophonic-pitches-information from among the sound-information (S_CANDIDATES) detected in step s522.

If the monophonic-pitches-information corresponding to the lowest peak frequency-components is detected, the lowest peak frequency-components are removed from the frequency-components contained in the current-frame in step s524. Thereafter, it is determined whether any peak frequency-components remain in the current-frame in step s525. If any remain, the steps s521 through s524 are repeated.

For example, in the case where three notes C4, E4, and G4 are contained in the current-frame of the input digital-sound-signals, the reference frequency-components of the note C4 are selected as the lowest peak frequency-components from among the peak frequency-components contained in the current-frame in step s521.

Next, the sound-information (S_CANDIDATES) containing the reference frequency-component of the note C4 is detected from the sound-information of the performed instrument in step s522. Here, generally, the sound-information of the note C4, the sound-information of a note C3, the sound-information of a note G2, and so on can be detected.

Then, in step s523, from among the several pieces of sound-information (S_CANDIDATES) detected in step s522, the sound-information (S_DETECTED) of C4 is selected as monophonic-pitches-information because of the high resemblance of its peak frequency-components to the selected ones.

Thereafter, the frequency-components of the detected sound-information (S_DETECTED) (i.e., the note C4) are removed from the frequency-components (i.e., the notes C4, E4, and G4) contained in the current-frame of the digital-sound-signals in step s524. Then, the frequency-components corresponding to the notes E4 and G4 remain in the current-frame. The steps s521 through s524 are repeated until no frequency-components remain in the current-frame. Through the above steps, monophonic-pitches-information with respect to all of the notes contained in the current-frame can be detected. In the above case, monophonic-pitches-information with respect to all of the notes C4, E4, and G4 can be detected by repeating the steps s521 through s524 three times.

Hereinafter, a method for analyzing digital-sounds using sound-information according to the present invention will be described based on the following Pseudo-code 1. For the parts of [Pseudo-code 1] which are not described, refer to conventional methods for analyzing digital-sounds.

[Pseudo-code 1]
line 1   input of digital-sound-signals (das)
line 2   // division of the das into frames considering the size of an FFT window
         // and a space between FFT windows (overlap is permitted)
line 3   frame = division of das into frames (das, fft-size, overlap-size)
line 4   for all frames
line 5     x = fft (frame)  // Fourier transform
line 6     peak = lowest peak frequency components (x)
line 7     timing = time information of a frame
line 8     while (peak exist)
line 9       candidates = sound information contains (peak)
line 10      sound = most similar sound information (candidates, x)
line 11      if sound is new pitch
line 12        subframe = division of the frame into subframes (frame, sub-size, overlap-size)
line 13        for all subframes
line 14          subx = fft (subframe)
line 15          if subx includes the peak
line 16            timing = time information of a subframe
line 17            exit-for
line 18          end-if
line 19        end-for
line 20      end-if
line 21      result = new result of analysis (result, timing, sound)
line 22      x = x − sound
line 23      peak = lowest peak frequency components (x)
line 24    end-while
line 25  end-for
line 26  performance = correction by instrument types (result)

Referring to [Pseudo-code 1], digital-sound-signals are input in line 1 and are divided into frames in line 3. Each of the frames is analyzed by repeating the for-loop of lines 4 through 25. The frequency-components are calculated through the Fourier transform in line 5, and the lowest peak frequency-components are selected in line 6. Subsequently, in line 7, the time-information of the current-frame, to be stored in line 21, is detected. The current-frame is analyzed by repeating the while-loop of lines 8 through 24 while peak frequency-components exist. The sound-information (candidates) containing the peak frequency-components of the current-frame is detected in line 9. The peak frequency-components contained in the current-frame are compared with those contained in the detected sound-information (candidates) to detect the sound-information (sound) whose peak frequency-components are most similar to those contained in the current-frame in line 10. Here, the detected sound-information is adjusted to the same strength as the strength of the peak-frequency of the current-frame. If it is determined in line 11 that the pitch corresponding to the sound-information detected in line 10 is a new one which was not contained in the previous frame, the size of the FFT window is reduced to extract accurate time-information.

To extract the accurate time-information, the current-frame is divided into a plurality of subframes in line 12, and each of the subframes is analyzed by repeating the for-loop of lines 13 through 19. The frequency-components of a subframe are calculated through the Fourier transform in line 14. If it is determined in line 15 that the subframe contains the lowest peak frequency-components selected in line 6, the time-information corresponding to the subframe is detected in line 16 to be stored in line 21. The time-information detected in line 7 has a large time error since a large FFT window is applied, whereas the time-information detected in line 16 has a small time error since a small FFT window is applied. Because the for-loop from line 13 to line 19 exits at line 17, it is not the time-information detected in line 7 but the more accurate time-information detected in line 16 that is stored in line 21.

As described above, when it is determined that a pitch is new, the size of the unit frame is reduced to detect accurate time-information in lines 11 through 20. Along with the time-information, the pitch-information and the strength-information of the detected pitch are stored in line 21. The frequency-components of the sound-information detected in line 10 are subtracted from the current-frame in line 22, and the next lowest peak frequency-components are searched for in line 23. The above procedure from line 9 is repeated, and the result of analyzing the digital-sound-signals is stored in the result-variable (result) in line 21.
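
For illustration, the following is a minimal Python rendering of the core loop of [Pseudo-code 1], under simplifying assumptions that are not part of the patent: sound_db maps each note name to a stored reference magnitude spectrum, peaks are taken as bins above a fixed floor, and "most similar" is plain Euclidean distance.

    import numpy as np

    def analyze_frame(x, sound_db, floor=1e-3):
        # Greedily match and subtract stored note spectra from the frame
        # spectrum x, from the lowest remaining peak upward (lines 6-24).
        x = x.copy()
        detected = []
        while True:
            peaks = np.flatnonzero(x > floor)        # remaining peak bins
            if peaks.size == 0:
                break                                # no peak left -> done
            lowest = peaks[0]                        # lowest peak frequency
            # candidates = stored sounds containing this peak bin (line 9)
            candidates = {n: s for n, s in sound_db.items() if s[lowest] > floor}
            if not candidates:
                break
            # sound = most similar sound-information (line 10), scaled to the
            # strength of the frame's peak before subtraction (line 22)
            name = min(candidates, key=lambda n: np.linalg.norm(x - candidates[n]))
            scaled = candidates[name] * (x[lowest] / candidates[name][lowest])
            detected.append(name)
            x = np.clip(x - scaled, 0.0, None)
        return detected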

However, the stored result (result) is insufficient to be used as information of the actually performed music. In the case of a piano, when a pitch is performed by pressing a key, the pitch is not represented by accurate frequency-components during the initial stage, the onset. Accordingly, the pitch can usually be analyzed accurately only after at least one frame has been processed. In this case, if it is considered that a pitch performed on a piano does not change within a very short time (for example, a time corresponding to three or four frames), more accurate performance-information can be detected. Therefore, the result-variable (result) is analyzed considering the characteristics of the corresponding instrument, and the result of the analysis is stored as more accurate performance-information (performance) in line 26.
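
A minimal sketch of this "correction by instrument types" (line 26) might look as follows, assuming the per-frame results are sets of note names; the three-frame gap is an illustrative reading of the example above, not a value fixed by the patent.

    def correct_piano(frames, max_gap=3):
        # Bridge short gaps in each detected pitch: a piano note cannot
        # vanish and reappear within three or four frames, so a brief
        # dropout is treated as an onset-analysis artifact.
        corrected = [set(f) for f in frames]
        for i in range(1, len(frames) - 1):
            for note in corrected[i - 1]:
                if note in corrected[i]:
                    continue
                if any(note in f for f in frames[i + 1:i + 1 + max_gap]):
                    corrected[i].add(note)
        return corrected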

FIGS. 11 through 11D are flowcharts of a method of analyzing digital-sounds according to a second embodiment of the present invention. The second embodiment of the present invention will be described in detail with reference to the attached drawings.

In the second embodiment, both the sound-information of different kinds of instruments and the score-information of the music to be performed are used. If all available kinds of information about the changes in the frequency-components of each pitch could be constructed as sound-information, the input digital-sound-signals could be analyzed very accurately. However, it is difficult to construct such sound-information in practice. The second embodiment is provided in consideration of this difficulty. In other words, in the second embodiment, the score-information of the music to be performed is selected so that the next input notes can be predicted based on the score-information. The input digital-sounds are then analyzed using the sound-information corresponding to the predicted notes.

FIG. 11 is a flowchart of a process for analyzing input digital-sounds based on sound-information of different kinds of instruments and score-information according to the second embodiment of the present invention. The process for analyzing input digital-sounds based on sound-information of different kinds of instruments and score-information according to the second embodiment of the present invention will be described with reference to FIG. 11.

After the sound-information of different kinds of instruments and the score-information of the music to be performed are generated and stored (not shown), the sound-information of the instrument for the actual performance and the score-information of the music to be actually performed are selected from among the stored sound-information and score-information in steps t100 and t200. Here, the sound-information of different kinds of instruments is stored in formats as shown in FIGS. 9A through 9E. Meanwhile, a method of generating the score-information of music to be performed is beyond the scope of the present invention. At present, there are many techniques for scanning printed scores, converting the scanned scores into MIDI data, and storing the performance-information. Thus, a detailed description of generating and storing score-information will be omitted.

The score-information includes pitch-information, note-length-information, speed-information, tempo-information, note-strength-information, detailed performance-information (e.g., staccato, staccatissimo, and pralltriller), and discrimination-information for performance using two hands or a plurality of instruments.

After the sound-information and score-information are selected in steps t100 and t200, if digital-sound-signals are input in step t300, the digital-sound-signals are decomposed into frequency-components in units of frames in step t500. The frequency-components of the digital-sound-signals are compared with the selected score-information and the frequency-components of the selected sound-information of the performed instrument and analyzed to detect performance-error-information and monophonic-pitches-information from the digital-sound-signals in step t600. Thereafter, the detected monophonic-pitches-information is output in step t700.

The performance accuracy can be estimated based on the performance-error-information in step t800. If the performance-error-information corresponds to a pitch (for example, a variation) intentionally performed by the player, the performance-error-information is added to the existing score-information in step t900. The steps t800 and t900 can be performed selectively.

FIG. 11A is a flowchart of the step t600 of detecting monophonic-pitches-information and performance-error-information from the input digital-sounds in units of frames based on the sound-information of different kinds of instruments and the score-information according to the second embodiment of the present invention. FIG. 11A shows the procedure for detecting monophonic-pitches-information and performance-error-information with respect to a single current-frame. Referring to FIG. 11A, the time-information of the current-frame is detected in step t610. The frequency-components of the current-frame are compared with the frequency-components of the selected sound-information of the performed instrument and with the score-information, and analyzed to detect the current pitch and strength information of each of the pitches in the current-frame in step t620. In step t640, monophonic-pitches-information and performance-error-information are detected from the detected pitch-information, note-strength-information, and time-information.

If it is determined in step t650 that a current pitch in the detected monophonic-pitches-information is a new one that was not included in the previous frame, the current-frame is divided into a plurality of subframes in step t660. A subframe including the new pitch is detected from among the plurality of subframes in step t670. The time-information of the detected subframe is detected in step t680. The time-information of the new pitch is updated with the time-information of the subframe in step t690. As in the first embodiment, the steps t650 through t690 can be omitted when the new pitch is in a low frequency range, or when accurate time-information is not required.

FIGS. 11B and 11C are flowcharts of the step t620 of comparing the frequency-components of the input digital-sounds with the frequency-components of the sound-information of the performed instrument in frame units based on the score-information, and analyzing the frequency-components of the digital-sounds based on the sound-information and the score-information according to the second embodiment of the present invention.

Referring to FIGS. 11B and 11C, in step t621, an expected-performance-value of the current-frame is generated by referring to the score-information in real time, and it is determined whether there is any note in the expected-performance-value that has not been compared with the digital-sound-signals in the current-frame.

If it is determined in step t621 that there is no note in the expected-performance-value which has not been compared with the digital-sound-signals in the current-frame, it is determined whether the frequency-components of the digital-sound-signals in the current-frame correspond to performance-error-information; performance-error-information and monophonic-pitches-information are detected; and the frequency-components of the sound-information corresponding to the performance-error-information and the monophonic-pitches-information are removed from the digital-sound-signals in the current-frame, in steps t622 through t628.

More specifically, the lowest peak frequency-components of the input digital-sound-signals in the current-frame are selected in step t622. Sound-information containing the selected peak frequency-components is detected from the sound-information of the performed instrument in step t623. The sound-information whose peak frequency-components are most similar to the selected peak frequency-components is detected from the sound-information detected in step t623 as performance-error-information in step t624. If it is determined in step t625 that the current pitches of the performance-error-information are contained in the next notes in the score-information, the current pitches of the performance-error-information are added to the expected-performance-value in step t626. Next, the current pitches of the performance-error-information are moved into the monophonic-pitches-information in step t627. The frequency-components of the sound-information detected as the performance-error-information or the monophonic-pitches-information in step t624 or t627 are removed from the current-frame of the digital-sound-signals in step t628.

If it is determined in step t621 that there is a note in the expected-performance-value which has not been compared with the digital-sound-signals in the current-frame, the digital-sound-signals are compared with the expected-performance-value and analyzed to detect monophonic-pitches-information from the digital-sound-signals in the current-frame, and the frequency-components of the sound-information detected as the monophonic-pitches-information are removed from the digital-sound-signals, in steps t630 through t634.

More specifically, the sound-information of the lowest pitch which has not been compared with the frequency-components contained in the current-frame of the digital-sound-signals is selected from the sound-information corresponding to the expected-performance-value which has not yet undergone comparison in step t630. If it is determined in step t631 that the frequency-components of the selected sound-information are included in the frequency-components contained in the current-frame of the digital-sound-signals, the selected sound-information is detected as monophonic-pitches-information in step t632. Then, the frequency-components of the selected sound-information are removed from the current-frame of the digital-sound-signals in step t633. If it is determined in step t631 that the frequency-components of the selected sound-information are not included in the frequency-components contained in the current-frame of the digital-sound-signals, the expected-performance-value is adjusted in step t635. The steps t630 through t633 are repeated until it is determined in step t634 that every pitch in the expected-performance-value has undergone comparison.

The steps t621 through t628 and t630 through t635 shown in FIGS. 11B and 11C are repeated until it is determined in step t629 that no peak frequency-components are left in the digital-sound-signals in the current-frame.

FIG. 11D is a flowchart of the step t635 of adjusting the expected-performance-value according to the second embodiment of the present invention. Referring to FIG. 11D, if it is determined in step t636 that the frequency-components of the selected sound-information have not been included in at least a predetermined number (N) of consecutive previous frames, and if it is determined in step t637 that the frequency-components of the selected sound-information were included in the digital-sound-signals at one or more time points, the notes corresponding to the selected sound-information are removed from the expected-performance-value in step t639. Alternatively, if it is determined in step t636 that the frequency-components of the selected sound-information have not been included in at least a predetermined number (N) of consecutive previous frames, and if it is determined in step t637 that the frequency-components of the selected sound-information were never included in the digital-sound-signals, the selected sound-information is detected as performance-error-information in step t638, and the notes corresponding to the selected sound-information are removed from the expected-performance-value in step t639.
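
A minimal sketch of this adjustment is given below; the bookkeeping structures (a per-note absence counter and a set of notes heard at least once) are assumptions introduced here for illustration.

    def adjust_expected(expected, absent_count, ever_heard, errors, n=4):
        # FIG. 11D: drop a note from the expected-performance-value once it
        # has been absent from at least N consecutive frames (t636, t639);
        # if it never sounded at any time point, also log it as a
        # performance error (t637, t638). Modifies the inputs in place.
        for note in list(expected):
            if absent_count.get(note, 0) < n:
                continue                    # still within the tolerated gap
            if note not in ever_heard:
                errors.append(note)         # t638: never actually performed
            expected.discard(note)          # t639: stop expecting this note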

Hereinafter, a method for analyzing digital-sounds using sound-information and score-information according to the present invention will be described based on the following Pseudo-code 2.

[Pseudo-code 2]
line 1   input of score information (score)
line 2   input of digital sound signals (das)
line 3   frame = division of das into frames (das, fft-size, overlap-size)
line 4   current performance value (current) = previous performance value (prev) = NULL
line 5   next performance value (next) = pitches to be initially performed
line 6   for all frames
line 7     x = fft (frame)
line 8     timing = time information of a frame
line 9     for all pitches (sound) in next & not in (current, prev)
line 10      if sound is contained in the frame
line 11        prev = prev + current
line 12        current = next
line 13        next = pitches to be performed next
line 14        exit-for
line 15      end-if
line 16    end-for
line 17    for all pitches (sound) in prev
line 18      if sound is not contained in the frame
line 19        prev = prev − sound
line 20      end-if
line 21    end-for
line 22    for all pitches (sound) in (current, prev)
line 23      if sound is not contained in the frame
line 24        result = performance error (result, timing, sound)
line 25      else  // if sound is contained in the frame
line 26        sound = adjustment of strength (sound, x)
line 27        result = new result of analysis (result, timing, sound)
line 28        x = x − sound
line 29      end-if
line 30    end-for
line 31    peak = lowest peak frequency (x)
line 32    while (peak exist)
line 33      candidates = sound information contains (peak)
line 34      sound = most similar sound information (candidates, x)
line 35      result = performance error (result, timing, sound)
line 36      x = x − sound
line 37      peak = lowest peak frequency components (x)
line 38    end-while
line 39  end-for
line 40  performance = correction by instrument types (result)

Referring to [Pseudo-code 2], in order to use both score-information and sound-information, score-information is first received in line 1. This pseudo-code is the most basic example of analyzing digital-sounds by comparing the information of each of the performed pitches with the digital-sounds using only the note-information in the score-information. The score-information input in line 1 is used to detect the next-performance-value (next) in lines 5 and 13. That is, the score-information is used to detect the expected-performance-value for each frame. Subsequently, as in Pseudo-code 1 using sound-information, digital-sound-signals are input in line 2 and are divided into a plurality of frames in line 3. The current-performance-value (current) and the previous-performance-value (prev) are set to NULL in line 4. The current-performance-value (current) corresponds to the information of the notes on the score corresponding to the pitches contained in the current-frame of the digital-sound-signals, the previous-performance-value (prev) corresponds to the information of the notes on the score corresponding to the pitches included in the previous frame of the digital-sound-signals, and the next-performance-value (next) corresponds to the information of the notes on the score corresponding to the pitches predicted to be included in the next frame of the digital-sound-signals.

Thereafter, analysis is performed on all of the frames by repeating the for-loop of lines 6 through 39. The Fourier transform is performed on the current-frame to detect its frequency-components in line 7. It is determined whether the performance has proceeded to the next position in the score in lines 9 through 16. In other words, if a new pitch which is not contained in the current-performance-value (current) or the previous-performance-value (prev) but is contained only in the next-performance-value (next) is contained in the current-frame of the digital-sound-signals, it is determined that the performance has proceeded to the next position in the score-information. Here, the previous-performance-value (prev), the current-performance-value (current), and the next-performance-value (next) are changed appropriately. Among the notes included in the previous-performance-value (prev), the notes which are not included in the current-frame of the digital-sound-signals are found and removed from the previous-performance-value (prev) in lines 17 through 21, thereby nullifying pitches which were continued in the real performance but have passed away in the score. It is determined whether each of the pieces of sound-information (sound) contained in the current-performance-value (current) and the previous-performance-value (prev) is contained in the current-frame of the digital-sound-signals in lines 22 through 30. If it is determined that the corresponding sound-information (sound) is not contained in the current-frame of the digital-sound-signals, the fact that the performance is different from the score is stored as the result. If it is determined that the sound-information (sound) is contained in the current-frame of the digital-sound-signals, the sound-information (sound) is adjusted according to the strength of the sound contained in the current-frame, and the pitch information, strength information, and time information are stored. As described above, in lines 9 through 30, the score-information corresponding to the pitches included in the current-frame of the digital-sound-signals is set as the current-performance-value (current), the score-information corresponding to the pitches included in the previous frame of the digital-sound-signals is set as the previous-performance-value (prev), the score-information corresponding to the pitches predicted to be included in the next frame of the digital-sound-signals is set as the next-performance-value (next), the previous-performance-value (prev) and the current-performance-value (current) are set as the expected-performance-value, and the digital-sound-signals are analyzed based on the notes corresponding to the expected-performance-value, so the analysis of the digital-sound-signals can be performed very accurately and quickly.
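
The bookkeeping of lines 9 through 21 might be sketched in Python as follows; the set-based score representation and the helper names are assumptions made for illustration only.

    def track_position(frame_notes, prev, current, nxt, score, pos):
        # Lines 9-16: a sounding note that is only in "next" means the
        # performance has moved to the next position in the score.
        if any(n in frame_notes for n in nxt - current - prev):
            prev, current = prev | current, nxt
            pos += 1
            nxt = score[pos + 1] if pos + 1 < len(score) else set()
        # Lines 17-21: drop held-over notes that have finally stopped sounding.
        prev = {n for n in prev if n in frame_notes}
        return prev, current, nxt, pos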

Moreover, line 31 is added in consideration of the case where the music is performed differently from the score-information. When peak frequency-components are left after the analysis of the pitches contained in the score-information has been completed, the remaining peak frequency-components correspond to notes performed differently from the score-information. Accordingly, the notes corresponding to the remaining peak frequency-components are detected using the algorithm of Pseudo-code 1 using sound-information, and the fact that the music has been performed differently from the score is stored, as in line 24 of Pseudo-code 2. For Pseudo-code 2, the method of using score-information has mainly been described, and other detailed descriptions are omitted. Like the method using only sound-information, the method using both sound-information and score-information can include lines 11 through 20 of Pseudo-code 1, in which the size of the unit frame for analysis is reduced in order to detect accurate time-information.

However, the result of the analysis and the performance errors stored in the result-variable (result) are insufficient to be used as information of the actually performed music. For the same reason as described for Pseudo-code 1, and considering that, although different pitches start at the same time according to the score-information, a very slight time difference among the pitches can occur in an actual performance, the result-variable (result) is analyzed considering the characteristics of the corresponding instrument and the characteristics of the player, and the result of the analysis is revised into the performance-information (performance) in line 40.

Hereinafter, the frequency characteristics of digital-sounds and musical-instrument sound-information will be described in detail.

FIG. 12 is a diagram of the result of analyzing the frequency-components of the acoustic-piano-sounds performed according to the first measure of the score shown in FIGS. 1 and 2. In other words, FIG. 12 is a spectrogram of the piano sounds performed according to the first measure of the second movement of Beethoven's Piano Sonata No. 8. Here, a grand piano made by the Young-chang piano company was used. A microphone was connected to a notebook computer made by Sony, and the sound was recorded using the recorder in a Windows auxiliary program. Freeware, Spectrogram version 5.1.6, developed and published by R. S. Horne, was used as the program for analyzing and displaying the spectrogram. The scale was set to 90 dB, the time scale was set to 5 msec, the fast Fourier transform (FFT) size was set to 8192, and default values were used for the others. Here, the scale set to 90 dB indicates that sound of less than −90 dB is ignored and not displayed. The time scale set to 5 msec indicates that the Fourier transform is performed with FFT windows overlapping every 5 msec to display the image.
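
For reference, an equivalent spectrogram could be computed with the settings quoted above roughly as follows; Python with scipy is an implementation choice here (the patent itself used R. S. Horne's Spectrogram tool), and the 44100 Hz sample rate is an assumption.

    import numpy as np
    from scipy.signal import stft

    def spectrogram_db(samples, rate=44100, fft_size=8192, step_ms=5.0,
                       floor_db=-90.0):
        # One spectrogram column every 5 msec with an 8192-point FFT, and
        # magnitudes below -90 dB suppressed, mirroring the settings above.
        hop = int(rate * step_ms / 1000.0)
        _, _, z = stft(samples, fs=rate, nperseg=fft_size,
                       noverlap=fft_size - hop)
        db = 20.0 * np.log10(np.abs(z) + 1e-12)
        return np.maximum(db, floor_db)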

A line 100 shown at the top of FIG. 12 indicates the strength of the input digital-sound-signals. Below the line 100, the frequency-components contained in the digital-sound-signals are displayed by frequency. A darker portion indicates that the magnitude of the frequency-component is larger than in the brighter portions. Accordingly, changes in the magnitudes of the individual frequency-components over time can be seen at a glance. Referring to FIGS. 12 and 2, it can be seen that the pitch-frequencies and harmonic-frequencies corresponding to the individual notes shown in the score of FIG. 2 appear in FIG. 12.

FIGS. 13A through 13G are diagrams of the results of analyzing the frequency-components of the sounds of the individual notes performed on the piano, which are contained in the first measure of the score of FIG. 2.

Each of the notes contained in the first measure of FIG. 2 was independently performed and recorded in the same environment, and the result of analyzing each recorded note was displayed as a spectrogram. In other words, FIGS. 13A through 13G are spectrograms of the piano sounds corresponding to the notes C4, A2♭, A3♭, E3♭, B3♭, D3♭, and G3, respectively. FIGS. 13A through 13G show the magnitudes of each of the frequency-components for 4 seconds. The conditions of analysis were set to be the same as those in the case of FIG. 12. The note C4 has a pitch-frequency of 262 Hz and harmonic-frequencies at integer multiples of the pitch-frequency, for example, 523 Hz, 785 Hz, and 1047 Hz. This can be confirmed in FIG. 13A. In other words, it shows that the frequency-components at 262 Hz and 523 Hz are strong, appearing in near-black portions, and that the magnitude roughly decreases from the frequency of 785 Hz toward the higher multiple harmonic-frequencies. The pitch-frequency and harmonic-frequencies of the note C4 are denoted by C4.

The note A2♭ has a pitch-frequency of 104 Hz. Referring to FIG. 13B, the harmonic-frequencies of the note A2♭ are much stronger than its pitch-frequency. Referring to FIG. 13B alone, because the 3rd harmonic-frequency of the note A2♭, 311 Hz, is the strongest among the frequency-components displayed, the note A2♭ may be erroneously recognized as the note E4♭, which has a pitch-frequency of 311 Hz, if the note is determined by the order of the magnitudes of the frequency-components.

In addition, if the notes are determined by the magnitudes of their frequency-components in FIGS. 13C through 13G, the same error can occur.
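
The harmonic collisions at issue here and below can be checked with simple equal-temperament arithmetic; the rounded pitch values are standard theory, not data from the patent's recordings.

    # Pitch-frequencies in Hz (equal temperament, A4 = 440 Hz).
    Ab2, C4, Eb4 = 103.8, 261.6, 311.1

    print(3 * Ab2)   # ~311 Hz: A2-flat's 3rd harmonic lands on E4-flat's pitch
    print(5 * Ab2)   # ~519 Hz: collides with C4's 2nd harmonic below
    print(2 * C4)    # ~523 Hz: C4's 2nd harmonic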

FIGS. 14A through 14G are diagrams of the results of indicating the frequency-components of each of the notes contained in the first measure of the score of FIG. 2 on FIG. 12.

FIG. 14A shows the frequency-components of the note C4 shown in FIG. 13A indicated on FIG. 12. Since the strength of the note C4 shown in FIG. 13A is greater than that shown in FIG. 12, the harmonic-frequencies of the note C4 shown in the upper portion of FIG. 12 are vague or too weak to be identified. However, if the frequency-magnitudes of FIG. 13A are lowered to match the magnitude of the pitch-frequency of the note C4 shown in FIG. 12 and are then compared with those of FIG. 12, it can be seen that the frequency-components of the note C4 are included in FIG. 12, as shown in FIG. 14A.
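
The lowering step can be pictured as a simple scaling; the sketch below is an illustration of ours, assuming magnitude spectra stored as NumPy arrays over the same frequency bins.

    import numpy as np

    def scale_to_performance(note_mag, perf_mag, pitch_bin):
        # lower the note's recorded spectrum so that its pitch-frequency
        # magnitude matches the one observed in the performed sound
        factor = perf_mag[pitch_bin] / (note_mag[pitch_bin] + 1e-12)
        return note_mag * factor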

FIG. 14B shows the frequency-components of the note A2♭ shown in FIG. 13B indicated on FIG. 12. Since the strength of the note A2♭ shown in FIG. 13B is greater than that shown in FIG. 12, the pitch-frequency and harmonic-frequencies of the note A2♭ are clearly shown in FIG. 13B but vaguely shown in FIG. 12, and particularly, the higher harmonic-frequencies are barely shown in the upper portion of FIG. 12. If the frequency-magnitudes of FIG. 13B are lowered to match the magnitude of the pitch-frequency of the note A2♭ shown in FIG. 12 and are then compared with those of FIG. 12, it can be seen that the frequency-components of the note A2♭ are included in FIG. 12, as shown in FIG. 14B. In FIG. 14B, the 5th harmonic-frequency-component of the note A2♭ is strong because it overlaps with the 2nd harmonic-frequency-component of the note C4. That is, because the 5th harmonic-frequency of the note A2♭ is 519 Hz and the 2nd harmonic-frequency of the note C4 is 523 Hz, they overlap in the same frequency range in FIG. 14B. In addition, referring to FIG. 14B, the ranges of the 5th, 10th, and 15th harmonic-frequencies of the note A2♭ respectively overlap with the ranges of the 2nd, 4th, and 6th harmonic-frequencies of the note C4, so the corresponding harmonic-frequencies appear stronger than in FIG. 13B. (Here, considering the fact that a weak sound is vaguely illustrated on a spectrogram, the sounds of the individual notes were recorded at greater strengths than in the actual performance shown in FIG. 12 to obtain FIGS. 13A through 13G, so that the frequency-components could be clearly distinguished from one another visually.)
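
Such collisions can be enumerated mechanically. In the sketch below, which is ours, two harmonics are treated as sharing a frequency range when they lie within about one percent of each other; that tolerance is our assumption, chosen to reflect how broadly peaks spread on a spectrogram.

    def harmonic_overlaps(f0_a, f0_b, count=16, rel_tol=0.01):
        # pairs (m, n) where the m-th harmonic of note A and the n-th
        # harmonic of note B occupy the same frequency range
        pairs = []
        for m in range(1, count + 1):
            for n in range(1, count + 1):
                fa, fb = m * f0_a, n * f0_b
                if abs(fa - fb) <= rel_tol * fa:
                    pairs.append((m, n))
        return pairs

    # A2-flat (about 104 Hz) against C4 (about 262 Hz)
    print(harmonic_overlaps(103.83, 261.63))   # [(5, 2), (10, 4), (15, 6)]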

FIG. 14C shows the frequency-components of the note A3♭ shown in FIG. 13C indicated on FIG. 12. Since the strength of the note A3♭ shown in FIG. 13C is greater than that shown in FIG. 12, the frequency-components shown in FIG. 13C are expressed as stronger than in FIG. 14C. Unlike the above-described notes, it is not easy to find the components of the note A3♭ alone in FIG. 14C, because many of the frequency-components of the note A3♭ overlap with the pitch- and harmonic-frequency-components of other notes, and because the note A3♭ was weakly performed for a while and disappeared while other notes were continuously performed. All of the frequency-components of the note A3♭ overlap with the even-numbered harmonic-frequencies of the note A2♭. In addition, the 5th harmonic-frequency of the note A3♭ overlaps with the 4th harmonic-frequency of the note C4, so it is difficult to identify the discontinued portion between the two portions of the note A3♭, which was separately performed two times while the note C4 was continuously performed. Nevertheless, the other frequency-components become weaker in the middle, so the harmonic-frequency-components of the note A2♭ and the discontinued portion of the note A3♭ can be identified.

FIG. 14D shows the frequency-components of the note E3♭ shown in FIG. 13D indicated on FIG. 12. Since the strength of the note E3♭ shown in FIG. 13D is greater than that shown in FIG. 12, the frequency-components shown in FIG. 13D are expressed as stronger than in FIG. 14D. The note E3♭ was separately performed four times. For the time during which the note E3♭ was performed the first two times, the 2nd and 4th harmonic-frequency-components of the note E3♭ overlap with the 3rd and 6th harmonic-frequency-components of the note A2♭, so the harmonic-frequency-components of the note A2♭ appear in the discontinued portion between the two separately performed portions of the note E3♭. In addition, the 5th harmonic-frequency-component of the note E3♭ overlaps with the 3rd harmonic-frequency-component of the note C4, so the frequency-components of the note E3♭ appear continuous across the portion that was discontinued in the actual performance. For the time during which the note E3♭ was performed the next two times, the 3rd harmonic-frequency-component of the note E3♭ overlaps with the 2nd harmonic-frequency-component of the note B3♭, so the frequency-component of the note E3♭ appears even while the note E3♭ is not actually performed. In addition, the 5th harmonic-frequency-component of the note E3♭ overlaps with the 4th harmonic-frequency-component of the note G3, so the 4th harmonic-frequency-component of the note G3 and the 5th harmonic-frequency-component of the note E3♭ are continuous even though the notes G3 and E3♭ were alternately performed.

FIG. 14E shows the frequency-components of the note B3♭ shown in FIG. 13E indicated on FIG. 12. Since the strength of the note B3♭ shown in FIG. 13E is a little greater than that shown in FIG. 12, the frequency-components shown in FIG. 13E are expressed as stronger than in FIG. 14E. However, the frequency-components of the note B3♭ shown in FIG. 13E almost match those in FIG. 14E. As shown in FIG. 13E, the harmonic-frequencies of the note B3♭ shown in the upper portion of FIG. 13E become very weak and vague as the sound of the note B3♭ becomes weaker. Similarly, in FIG. 14E, the harmonic-frequencies shown in the upper portion become weaker toward the right end.

FIG. 14F shows the frequency-components of the note D3♭ shown in FIG. 13F indicated on FIG. 12. Since the strength of the note D3♭ shown in FIG. 13F is greater than that shown in FIG. 12, the frequency-components shown in FIG. 13F are expressed as stronger than in FIG. 14F. However, the frequency-components of the note D3♭ shown in FIG. 13F almost match those in FIG. 14F. Particularly, just as the 9th harmonic-frequency of the note D3♭ is weaker than its 10th harmonic-frequency in FIG. 13F, the 9th harmonic-frequency of the note D3♭ is very weak, and weaker than the 10th harmonic-frequency, in FIG. 14F. However, since the 5th and 10th harmonic-frequencies of the note D3♭ shown in FIG. 14F overlap with the 3rd and 6th harmonic-frequencies of the note B3♭ shown in FIG. 14E, the 5th and 10th harmonic-frequencies of the note D3♭ appear stronger than the other harmonic-frequencies of the note D3♭. Since the 5th harmonic-frequency of the note D3♭ is 693 Hz, and the 3rd harmonic-frequency of the note B3♭ is very close at 699 Hz, they overlap in the spectrogram.

FIG. 14G shows the frequency-components of the note G3 shown in FIG. 13G indicated on FIG. 12. Since the strength of the note G3 shown in FIG. 13G is a little greater than that shown in FIG. 12, the frequency-components shown in FIG. 13G are expressed as stronger than in FIG. 14G. Since the note G3 shown in FIG. 14G was performed more strongly than the note A3♭ shown in FIG. 14C, each of the frequency-components of the note G3 can be found clearly. In addition, unlike in FIGS. 14C and 14F, the frequency-components of the note G3 rarely overlap with the frequency-components of the other notes, so each of the frequency-components of the note G3 can be easily identified visually. However, although the 4th harmonic-frequency of the note G3 and the 5th harmonic-frequency of the note E3♭ shown in FIG. 14D are similar, at 784 Hz and 778 Hz respectively, since the notes E3♭ and G3 are performed at different time points, the 5th harmonic-frequency-component of the note E3♭ appears a little below the portion between the two separate portions of the 4th harmonic-frequency-component of the note G3.

FIG. 15 is a diagram in which the frequencies shown in FIG. 12 are compared with the frequency-components of the individual notes contained in the score of FIG. 2. In other words, the results of analyzing the frequency-components shown in FIG. 12 are displayed in FIG. 15 so that the results can be understood at a glance. In the above-described method for analyzing music according to the present invention, the frequency-components of the individual notes shown in FIGS. 13A through 13G are used to analyze the frequency-components shown in FIG. 12. As a result, FIG. 15 can be obtained. The method of analyzing input digital-sounds using musical-instrument sound-information according to the present invention can be summarized through FIG. 15. In other words, in the above-described method of the present invention, the sounds of individual notes actually performed are received, and the frequency-components of the received sounds are used as musical-instrument sound-information.
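
The summarized analysis can be sketched as a match-and-remove loop over each frame, with the spectra of FIGS. 13A through 13G acting as per-note templates. The sketch below is ours; the peak threshold, the similarity measure, and the scaling rule are illustrative assumptions, not the invention's exact rules.

    import numpy as np

    def analyze_frame(frame_mag, templates):
        # frame_mag: magnitude spectrum of one frame of the input sound
        # templates: {note name: magnitude spectrum of that note, same bins}
        residual = frame_mag.copy()
        floor = 0.05 * frame_mag.max()                 # assumed peak threshold
        detected = []
        while residual.max() > floor:
            lowest = int(np.argmax(residual > floor))  # lowest peak bin
            # candidate notes whose lowest component sits at that bin
            candidates = {name: t for name, t in templates.items()
                          if int(np.argmax(t > 0.05 * t.max())) == lowest}
            if not candidates:
                break
            # keep the template most similar to the residual, then remove it
            name = max(candidates,
                       key=lambda k: float(np.dot(candidates[k], residual)))
            scale = residual[lowest] / (templates[name][lowest] + 1e-12)
            residual = np.maximum(residual - scale * templates[name], 0.0)
            detected.append(name)
        return detected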

It has been described that the frequency-components are analyzed using FFT. However, it is apparent that wavelets or other techniques developed from digital signal processing algorithms can be used instead of FFT to analyze the frequency-components. In other words, the Fourier transform, the most representative technique, is used in a descriptive sense only, and the present invention is not restricted thereto.

Meanwhile, in FIGS. 14A through 15, the time-information of the frequency-components of the notes is different from that of the actual performance. Particularly, in FIG. 15, the notes start at 1500, 1501, 1502, 1503, 1504, 1505, 1506, and 1507 in the actual performance, but their frequency-components appear before these start-points. Moreover, the frequency-components appear after the end-points of the actually performed notes. These timing-errors occur because the size of an FFT window is set to 8192 in order to accurately analyze the frequency-components according to the flow of time. The range of the timing-errors depends on the size of the FFT window. In the above embodiment, the sampling rate is 22050 Hz and the FFT window is 8192 samples, so the error is 8192÷22050≈0.37 seconds. In other words, when the size of the FFT window increases, the size of a unit frame also increases, thereby decreasing the gap between identifiable frequencies. As a result, frequency-components can be accurately analyzed according to their pitches, but the timing-errors increase. When the size of the FFT window decreases, the gap between identifiable frequencies increases. As a result, notes close to each other in a low frequency range cannot be distinguished from one another, but the timing-errors decrease. Alternatively, increasing the sampling rate can decrease the range of the timing-errors.
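
The trade-off can be tabulated directly: at the 22050 Hz sampling rate of the embodiment, the gap between identifiable frequencies is the sampling rate divided by the FFT window size, while the timing-error is the window size divided by the sampling rate. A minimal sketch:

    rate = 22050
    for fft_size in (512, 1024, 2048, 4096, 8192):
        freq_gap = rate / fft_size        # Hz between identifiable frequencies
        timing_error = fft_size / rate    # seconds, e.g. 8192/22050 = 0.37
        print(f"{fft_size:5d} samples: {freq_gap:5.1f} Hz gap, "
              f"{timing_error:.3f} s timing-error")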

FIGS. 16A through 16D are diagrams of the results of analyzing notes performed according to the first measure of the score shown in FIGS. 1 and 2 using FFT windows of different sizes, in order to explain the changes in the timing-errors according to changes in the size of the FFT window.

FIG. 16A shows the result of analysis in the case where the size of the FFT window is set to 4096. FIG. 16B shows the result of analysis in the case where the size of the FFT window is set to 2048. FIG. 16C shows the result of analysis in the case where the size of the FFT window is set to 1024. FIG. 16D shows the result of analysis in the case where the size of the FFT window is set to 512.

Meanwhile, FIG. 15 shows the result of analysis in the case where the size of the FFT window is set to 8192. Accordingly, by comparing the results shown in FIGS. 15 through 16D, it can be inferred that when the size of the FFT window increases, the gap between identifiable frequencies becomes narrower, allowing fine analysis, but the timing-error increases; whereas when the size of the FFT window decreases, the gap between identifiable frequencies becomes wider, making fine analysis difficult, but the timing-error decreases.

Therefore, when analysis is performed, the size of the FFT window can be changed according to the required time accuracy and the required frequency accuracy. Alternatively, time-information and frequency-information can be analyzed using FFT windows of different sizes.
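
A minimal sketch of that alternative, assuming SciPy's stft: the same signal is analyzed twice, with the long window supplying the frequency-information and the short window supplying the time-information.

    from scipy.signal import stft

    def dual_resolution_stft(samples, rate):
        # long window: fine frequency resolution for pitch-analysis
        freqs_fine, times_coarse, spec_pitch = stft(samples, fs=rate,
                                                    nperseg=8192)
        # short window: fine time resolution for onset-detection
        freqs_coarse, times_fine, spec_time = stft(samples, fs=rate,
                                                   nperseg=1024)
        return (freqs_fine, spec_pitch), (times_fine, spec_time)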

FIGS. 17A and 17B show the timing-errors occurring during the analysis of digital-sounds, which vary with the size of the FFT window. Here, a white area corresponds to an FFT window in which a particular note is found. In FIG. 17A, the size of the FFT window is large, at 8192, so the white area corresponding to the window in which the particular note is found is wide. In FIG. 17B, the size of the FFT window is small, at 1024, so the white area corresponding to the window in which the particular note is found is narrow.

FIG. 17A is a diagram of the result of analyzing digital-sounds when the size of the FFT window is set to 8192. Referring to FIG. 17A, the note actually starts at a point 9780, but according to the result of FFT the note starts at a point 12288 (=(8192+16384)/2), the middle of the window in which the particular note is found. Here, there occurs an error of a time corresponding to 2508 samples, i.e., the difference between the 12288th sample and the 9780th sample. In other words, at a sampling rate of 22.5 kHz, an error of about 2508×(1/22500)≈0.11 seconds occurs.

FIG. 17B is a diagram of the result of analyzing digital-sounds when the size of the FFT window is set to 1024. Referring to FIG. 17B, as in FIG. 17A, the note actually starts at a point 9780, but according to the result of FFT the note starts at a point 9728 (=(9216+10240)/2). Here, it is determined that the note starts at the time point corresponding to the 9728th sample, the middle of the range between the 9216th sample and the 10239th sample. The error is only a time corresponding to 52 samples. At a sampling rate of 22.5 kHz, an error of about 0.002 seconds occurs according to the above-described calculation method. Therefore, it can be inferred that a more accurate result of analysis can be obtained as the size of the FFT window decreases.
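
The arithmetic of both figures reduces to reporting the onset at the midpoint of the window in which the note is first found; the sketch below, added for illustration, reproduces the two examples above.

    def reported_onset(window_start, fft_size):
        # the onset reported by FFT: the midpoint of the window in which
        # the note is first found
        return (window_start + (window_start + fft_size)) // 2

    rate = 22500                      # 22.5 kHz, as in the examples above
    actual = 9780
    for start, size in ((8192, 8192), (9216, 1024)):
        reported = reported_onset(start, size)
        err = abs(reported - actual)  # 2508 samples, then 52 samples
        print(f"FFT {size}: reported {reported}, "
              f"error {err} samples = {err / rate:.3f} s")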

FIG. 18 is a diagram of the result of analyzing the frequency-components of the sounds obtained by putting together a plurality of individual pitches detected using the sound-information and the score-information according to the second embodiment of the present invention. In other words, the score-information is detected from the score shown in FIG. 1, and the sound-information described with reference to FIGS. 13A through 13G is used.

More specifically, it is detected from the score-information detected from the score of FIG. 1 that the notes C4, A3♭, and A2♭ are initially performed for 0.5 seconds. Sound-information of the notes C4, A3♭, and A2♭ is detected from the information shown in FIGS. 13A through 13C. The input digital-sounds are analyzed using the selected score-information and the selected sound-information. The result of the analysis is shown in FIG. 18. Here, it can be found that the portion of FIG. 12 corresponding to the initial 0.5 seconds is almost the same as the corresponding portion of FIG. 14D. Accordingly, the portion of FIG. 18 corresponding to the initial 0.5 seconds, which corresponds to (result) or (performance) in Pseudo-code 2, is the same as the portion of FIG. 12 corresponding to the initial 0.5 seconds.
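
A minimal sketch of this selection step, with a data layout of our own choosing: the score-information names the notes expected in each time range, and only those notes' sound-information is then matched against the input digital-sounds.

    # score-information for the opening of the first measure, as stated above;
    # the dictionary layout is an assumption for illustration
    score_information = [
        {"start": 0.0, "end": 0.5, "notes": ["C4", "A3b", "A2b"]},
        # ... remaining entries of the first measure
    ]

    def expected_notes(t):
        # the notes the score expects at time t (in seconds)
        for entry in score_information:
            if entry["start"] <= t < entry["end"]:
                return entry["notes"]
        return []

    print(expected_notes(0.25))   # ['C4', 'A3b', 'A2b']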

While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes may be made therein without departing from the essential characteristics of this invention. The above embodiments have been used in a descriptive sense only and not for purposes of limitation. Therefore, it will be understood that the scope of the invention is defined by the appended claims.

INDUSTRIAL APPLICABILITY

According to the present invention, input digital-sounds can be quickly analyzed using sound-information, or both sound-information and score-information. In conventional methods for analyzing digital-sounds, music composed of polyphonic-pitches, for example, piano music, cannot be analyzed. However, according to the present invention, polyphonic-pitches as well as monophonic-pitches contained in digital-sounds can be quickly and accurately analyzed using sound-information, or both sound-information and score-information.

Accordingly, the result of analyzing digital-sounds according to the present invention can be directly applied to an electronic-score, and performance-information can be quantitatively detected using the result of the analysis. This result of analysis can be widely used, from musical education for children to professional players' practice.

That is, by using the technique of the present invention allowing input digital-sounds to be analyzed in real time, the positions of currently performed notes on an electronic-score are recognized in real time and the positions of the notes to be performed next are automatically indicated on the electronic-score, so that players can concentrate on the performance without caring about turning the pages of a paper-score.

In addition, the present invention compares the performance-information obtained as the result of the analysis with previously stored score-information to detect performance accuracy, so that players can be informed of wrongly performed notes. The detected performance accuracy can be used as data by which a player's performance is evaluated.

CLAIMS

1. A method for analyzing digital-sounds using sound-information of musical-instruments, the method comprising the steps of: (a) generating and storing sound-information of different musical instruments; (b) selecting the sound-information of the particular instrument to be actually played from among the stored sound-information of different musical-instruments; (c) receiving digital-sound-signals; (d) decomposing the digital-sound-signals into frequency-components in units of frames; (e) comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information of the particular instrument and analyzing the frequency-components of the digital-sound-signals to detect monophonic-pitches-information from the digital-sound-signals; and (f) outputting the detected monophonic-pitches-information.
2. The method of claim 1, wherein the step (e) comprises detecting time-information of each frame, comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information of the particular instrument and analyzing the frequency-components of the digital-sound-signals in units of frames, and detecting pitch-information, strength-information, and time-information of each of the individual pitches contained in each of the frames.
3. The method of claim 1 or 2, wherein the step (e) comprises the steps of: (e1) selecting the lowest peak frequency-components contained in a current-frame of the digital-sound-signals; (e2) detecting the sound-information containing the lowest peak frequency-components from the selected sound-information of the particular instrument; (e3) detecting, as monophonic-pitches-information, the sound-information containing the peak frequency-components most similar to those of the current-frame from among the sound-information detected in step (e2); (e4) removing the frequency-components of the sound-information detected as the monophonic-pitches-information in step (e3) from the current-frame; and (e5) repeating steps (e1) through (e4) when there are any peak frequency-components left in the current-frame.
4. The method of claim 2, wherein the step (e) further comprises determining whether the detected monophonic-pitches-information contains any new-pitch which is not included in a previous-frame, dividing a current-frame including the new-pitch into subframes if it is determined that the detected monophonic-pitches-information contains the new-pitch, finding the subframe including the new-pitch, and detecting pitch-information and strength-information of the new-pitch and time-information of the found subframe.
5. The method of claim 1, wherein the step (a) comprises periodically updating the sound-information of different musical instruments.
6. The method of claim 1, wherein the step (a) comprises storing each individual pitch which can be expressed by the sound-information in the form of wave data when storing the sound-information of different musical instruments in the form of samples of sounds having at least one strength, and extracting the frequency-components of the sound-information of different musical instruments from the stored wave data.
7. The method of claim 1, wherein the step (a) comprises storing each individual pitch which can be expressed by the sound-information in a form which can directly express the magnitudes of the frequency-components of the pitch when storing the sound-information of different musical instruments in the form of samples of sounds having at least one strength.
8. The method of claim 6 or 7, wherein the step (a) comprises separately storing sound-information of keyboard-instruments according to the use or nonuse of pedals.
9. The method of claim 6 or 7, wherein the step (a) comprises separately storing sound-information of string-instruments by each string.
10. The method of claim 7, wherein the step (a) comprises performing Fourier transform on the sound-information of different musical instruments and storing the sound-information in a form in which the sound-information can be directly displayed.
11. The method of claim 7, wherein the step (a) comprises performing wavelet transform on the sound-information of different musical instruments and storing the sound-information in a form in which the sound-information can be directly displayed.
12. A method for analyzing digital-sounds using sound-information of musical-instruments and score-information, the method comprising the steps of: (a) generating and storing sound-information of different musical instruments; (b) generating and storing score-information of a score to be performed; (c) selecting the sound-information of the particular instrument to be actually played and the score-information of the score to be actually performed from among the stored sound-information of different musical instruments and the stored score-information; (d) receiving digital-sound-signals; (e) decomposing the digital-sound-signals into frequency-components in units of frames; (f) comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information of the particular instrument and the selected score-information, and analyzing the frequency-components of the digital-sound-signals to detect performance-error-information and monophonic-pitches-information from the digital-sound-signals; and (g) outputting the detected monophonic-pitches-information.
13. The method of claim 12, wherein the step (f) comprises detecting time-information of each frame, comparing the frequency-components of the digital-sound-signals with the frequency-components of the selected sound-information of the particular instrument and the selected score-information, analyzing the frequency-components of the digital-sound-signals in units of frames, and detecting pitch-information, strength-information, and time-information of each of the individual pitches contained in each of the frames.
14. The method of claim 12 or 13, wherein the step (f) further comprises determining whether the detected monophonic-pitches-information contains any new-pitch which is not included in a previous frame, dividing a current frame including the new-pitch into subframes if it is determined that the detected monophonic-pitches-information contains the new-pitch, finding the subframe including the new-pitch, and detecting pitch-information and strength-information of the new-pitch and time-information of the found subframe.
15. The method of claim 12 or 13, wherein the step (f) comprises the steps of: (f1) generating expected-performance-values of the current-frame by referring to the score-information in real time, and determining whether there is any note in the expected-performance-values which has not been compared with the digital-sound-signals in the current-frame; (f2) if it is determined in step (f1) that there is no note in the expected-performance-values which has not been compared with the digital-sound-signals in the current-frame, determining whether the frequency-components of the digital-sound-signals in the current-frame correspond to performance-error-information, detecting performance-error-information and monophonic-pitches-information, and removing the frequency-components of the sound-information corresponding to the performance-error-information and the monophonic-pitches-information from the digital-sound-signals in the current-frame; (f3) if it is determined in step (f1) that there is any note in the expected-performance-values which has not been compared with the digital-sound-signals in the current-frame, comparing the digital-sound-signals in the current-frame with the expected-performance-values, analyzing them to detect monophonic-pitches-information from the digital-sound-signals in the current-frame, and removing the frequency-components of the sound-information detected as the monophonic-pitches-information from the digital-sound-signals in the current-frame; and (f4) repeating steps (f1) through (f3) when there are any peak frequency-components left in the current-frame of the digital-sound-signals.
16. The method of claim 15, wherein the step (f2) comprises the steps of: (f2-1) selecting the lowest peak frequency-components contained in the current-frame of the digital-sound-signals; (f2-2) detecting the sound-information containing the lowest peak frequency-components from the selected sound-information of the particular instrument; (f2-3) detecting, as performance-error-information, the sound-information containing the peak frequency-components most similar to the peak frequency-components of the current-frame from the detected sound-information; (f2-4) if it is determined that the current pitches of the performance-error-information are contained in the next notes in the score-information, adding the current pitches of the performance-error-information to the expected-performance-values and moving the current pitches of the performance-error-information into the monophonic-pitches-information; and (f2-5) removing the frequency-components of the sound-information detected as the performance-error-information or the monophonic-pitches-information from the digital-sounds in the current-frame.
17. The method of claim 16, wherein the step (f2-3) comprises detecting the pitch and strength of the corresponding performed note as the performance-error-information.
18. The method of claim 16, wherein the step (f3-3) comprises removing an expected-performance-value corresponding to the selected sound-information whose frequency-components are included in the digital-sound-signals at one or more time points but are not included in at least a predetermined number (N) of consecutive previous frames.

19. The method of claim 15, wherein the step (f3) comprises the steps of: (f3-1) selecting the sound-information of the lowest peak frequency-components which has not been compared with the frequency-components contained in the current-frame of the digital-sound-signals from the sound-information corresponding to the expected-performance-values which have not undergone comparison; (f3-2) if it is determined that the frequency-components of the selected sound-information are included in the frequency-components contained in the current-frame of the digital-sound-signals, detecting the selected sound-information as monophonic-pitches-information and removing the frequency-components of the selected sound-information from the current-frame of the digital-sound-signals; and (f3-3) if it is determined that the frequency-components of the selected sound-information are not included in the frequency-components contained in the current-frame of the digital-sound-signals, adjusting the expected-performance-values.
20. The method of claim 12, wherein the step (a) comprises periodically updating the sound-information of different musical instruments.
21. The method of claim 12, wherein the step (a) comprises storing each individual pitch which can be expressed by the sound-information in the form of wave data when storing the sound-information of different musical instruments in the form of samples of sounds having at least one strength.
22. The method of claim 12, wherein the step (a) comprises storing each individual pitch which can be expressed by the sound-information in a form which can directly express the magnitudes of the frequency-components of the pitch when storing the sound-information of different musical instruments in the form of samples of sounds having at least one strength.
23. The method of claim 21 or 22, wherein the step (a) comprises separately storing sound-information of keyboard-instruments according to the use or nonuse of pedals.
24. The method of claim 21 or 22, wherein the step (a) comprises separately storing sound-information of string-instruments by each string.
25. The method of claim 22, wherein the step (a) comprises performing Fourier transform on the sound-information of different musical instruments and storing the sound-information in a form in which the sound-information can be directly displayed.
26. The method of claim 22, wherein the step (a) comprises performing wavelet transform on the sound-information of different musical instruments and storing the sound-information in a form in which the sound-information can be directly displayed.
27. The method of claim 12, further comprising the step of (h) estimating performance accuracy based on the performance-error-information detected in step (f).
28. The method of claim 12, further comprising the step of (i) adding the individual notes of the performance-error-information to the existing score-information based on the performance-error-information detected in step (f).
29. The method of claim 12, wherein the step (b) comprises generating and storing at least one kind of information selected from the group consisting of pitch-information, note-length-information, speed-information, tempo-information, note-strength-information, detailed performance-information including staccato, staccatissimo, and pralltriller, and discrimination-information for performance using two hands or performance using a plurality of instruments, based on the score to be performed.